Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets
Gene annotation databases (compendiums maintained by the scientific community
that describe the biological functions performed by individual genes) are
commonly used to evaluate the functional properties of experimentally derived
gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often
employed to assess these associations, but don't account for non-uniformity in
the number of genes annotated to individual functions or the number of
functions associated with individual genes. We find FET is strongly biased
toward over-estimating overlap significance if a gene set has an unusually high
number of annotations. To correct for these biases, we develop Annotation
Enrichment Analysis (AEA), which properly accounts for the non-uniformity of
annotations. We show that AEA is able to identify biologically meaningful
functional enrichments that are obscured by numerous false-positive enrichment
scores in FET, and we therefore suggest it be used to more accurately assess
the biological properties of gene sets.
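The overlap statistic can be made concrete with a minimal sketch of the one-sided Fisher's Exact Test used in gene-set enrichment, written as the upper tail of the hypergeometric distribution. The function name and arguments are illustrative and do not come from the AEA paper, and AEA itself is not shown. Note that this null model treats every gene in the universe as equally likely to carry an annotation. That is exactly the uniformity assumption the abstract identifies as a source of bias.

```python
from math import comb

def fet_overlap_pvalue(n_universe, n_set, n_term, n_overlap):
    """One-sided Fisher's Exact Test (hypergeometric upper tail).

    Probability of seeing at least n_overlap annotated genes when n_set genes
    are drawn at random from n_universe genes, n_term of which carry the
    annotation. Every gene is assumed equally likely to be annotated.
    """
    total = comb(n_universe, n_set)
    upper = min(n_set, n_term)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 genes, 2 annotated to the term, a 2-gene set hitting both.
print(fet_overlap_pvalue(10, 2, 2, 2))  # equals 1/45
```

Because the sum runs over exact integer counts, the result is exact rather than approximated. For genome-scale universes a library routine (for example a hypergeometric survival function) would be faster.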
High Performance Computing of Gene Regulatory Networks using a Message-Passing Model
Gene regulatory network reconstruction is a fundamental problem in
computational biology. We recently developed an algorithm, called PANDA
(Passing Attributes Between Networks for Data Assimilation), that integrates
multiple sources of 'omics data and estimates regulatory network models. This
approach was initially implemented in the C++ programming language and has
since been applied to a number of biological systems. In our current research
we are beginning to expand the algorithm to incorporate larger and more diverse
data-sets, to reconstruct networks that contain increasing numbers of elements,
and to build not only single network models, but sets of networks. In order to
accomplish these "Big Data" applications, it has become critical that we
increase the computational efficiency of the PANDA implementation. In this
paper we show how to recast PANDA's similarity equations as matrix operations.
This allows us to implement a highly readable version of the algorithm using
the MATLAB/Octave programming language. We find that the resulting M-code is much
shorter (103 compared to 1128 lines) and more easily modifiable for potential
future applications. The new implementation also runs significantly faster,
with increasing efficiency as the network models increase in size. Tests
comparing the C++ and M-code versions of PANDA demonstrate that the M-code
runs roughly 20 to 80 times faster for networks of similar dimensions to
those found in current biological applications.
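The core idea, replacing element-by-element loops over a similarity equation with whole-matrix operations, can be sketched in Python with NumPy instead of MATLAB/Octave. The sketch assumes a Tanimoto-style similarity between row i of X and column j of Y, T[i, j] = a / sqrt(|x|^2 + |y|^2 - |a|) with a = x·y. This is our reading of published PANDA descriptions, and the exact form in the M-code may differ. The function names are hypothetical.

```python
import numpy as np

def similarity_loop(X, Y):
    """Reference version: explicit loops over row i of X and column j of Y."""
    n, m = X.shape[0], Y.shape[1]
    T = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            x, y = X[i, :], Y[:, j]
            a = x @ y
            T[i, j] = a / np.sqrt(x @ x + y @ y - abs(a))
    return T

def similarity_matrix(X, Y):
    """Same quantity computed with whole-matrix operations (no Python loops)."""
    A = X @ Y                                        # all dot products at once
    row_sq = np.sum(X ** 2, axis=1, keepdims=True)   # |x_i|^2, shape (n, 1)
    col_sq = np.sum(Y ** 2, axis=0, keepdims=True)   # |y_j|^2, shape (1, m)
    return A / np.sqrt(row_sq + col_sq - np.abs(A))  # broadcasts to (n, m)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))  # synthetic data, e.g. 30 regulators x 50 features
Y = rng.normal(size=(50, 40))
print(np.allclose(similarity_loop(X, Y), similarity_matrix(X, Y)))  # True
```

Both functions return the same matrix. The vectorized form hands the work to optimized linear-algebra routines, which is the kind of restructuring the abstract describes. This sketch makes no claim about the size of the resulting speed-up.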
Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks
Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular-level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function.
We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association," which measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The split-motif algorithm finds pairs of DNA motifs that preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites, and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed by standard motif-finding algorithms, despite emerging evidence of the importance of complex regulation.
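The signal that a split-motif search looks for, two motifs that tend to occur at a preferred distance from each other, can be illustrated with a small offset histogram. This is a hypothetical sketch, not the thesis's split-motif algorithm. The helper names, the exact-match motifs, the toy sequence and the max_dist window are all assumptions. A real analysis would also compare the counts against a background model.

```python
import re
from collections import Counter

def motif_positions(seq, motif):
    """Start positions of exact (possibly overlapping) matches of motif in seq."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

def offset_counts(seq, motif_a, motif_b, max_dist=50):
    """Count signed offsets (start of B minus start of A) up to max_dist apart.

    A pronounced peak at one offset hints that the two motifs are spaced in a
    preferred way relative to one another.
    """
    counts = Counter()
    for a in motif_positions(seq, motif_a):
        for b in motif_positions(seq, motif_b):
            d = b - a
            # skip d == 0, e.g. an occurrence paired with itself if motif_a == motif_b
            if d != 0 and abs(d) <= max_dist:
                counts[d] += 1
    return counts

seq = "TTGATACCTGCATTGATACCTGCA"  # toy sequence, not real data
counts = offset_counts(seq, "GATA", "TGCA")
print(counts[6])  # 2
```

Here "TGCA" starts six bases after "GATA" twice, so offset +6 stands out from the single occurrences at +18 and -6.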
In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternative, natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes.
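One simple way to turn gene-term annotations into networks is to score pairs by shared annotations. This minimal sketch uses Jaccard similarity, which is our own choice for illustration rather than the thesis's method, and it ignores the GO hierarchy entirely. The same function builds a gene-gene network from gene-to-terms data and a term-term network from the inverted (term-to-genes) data. The names and the term IDs T1 to T4 are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Fraction of shared annotations: |a & b| / |a | b| (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_network(membership, min_weight=0.0):
    """membership: dict node -> set of labels. Returns {(u, v): weight}."""
    edges = {}
    for u, v in combinations(sorted(membership), 2):
        w = jaccard(membership[u], membership[v])
        if w > min_weight:
            edges[(u, v)] = w
    return edges

def invert(membership):
    """Flip gene -> terms into term -> genes (same bipartite graph, other side)."""
    flipped = {}
    for node, labels in membership.items():
        for label in labels:
            flipped.setdefault(label, set()).add(node)
    return flipped

# Hypothetical annotations; T1..T4 stand in for GO term IDs.
annotations = {"geneA": {"T1", "T2", "T3"}, "geneB": {"T2", "T3"}, "geneC": {"T4"}}
gene_edges = similarity_network(annotations)          # gene-gene network
term_edges = similarity_network(invert(annotations))  # term-term network
print(gene_edges)  # only geneA-geneB, with weight 2/3
```

Treating annotations as a bipartite gene-term graph makes the two projections symmetric, which is why one function serves both views.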
Recommended from our members
Parental attachment as a predictor of sexual, physical, and emotional abuse revictimization
Explores why revictimization occurs in women who were sexually abused as children. Examines variables such as nature and severity of childhood abuse, attachment, and self-esteem to identify predictors of repeated abuse. A correlational-regression approach was used to test the hypothesis that lower positive attachment to parental figures, mediated by low self-esteem, will be associated with revictimization in adulthood. Approximately 150 women (ages 18 to 54; mean age 27) from various communities across Southern California participated in the study. Results did not support the hypothesis. Though self-esteem was correlated with both attachment and revictimization individually, there was no mediational effect of self-esteem between parental attachment and revictimization.
Estimating sample-specific regulatory networks
Biological systems are driven by intricate interactions among the complex
array of molecules that comprise the cell. Many methods have been developed to
reconstruct network models of those interactions. These methods often draw on
large numbers of samples with measured gene expression profiles to infer
connections between genes (or gene products). The result is an aggregate
network model representing a single estimate for the likelihood of each
interaction, or "edge," in the network. While informative, aggregate models
fail to capture the heterogeneity that is represented in any population. Here
we propose a method to reverse engineer sample-specific networks from aggregate
network models. We demonstrate the accuracy and applicability of our approach
in several data sets, including simulated data, microarray expression data from
synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid
cell lines. We show that these sample-specific networks can be used to study
changes in network topology across time and to characterize shifts in gene
regulation that may not be apparent in expression data. We believe the ability
to generate sample-specific networks will greatly facilitate the application of
network methods to the increasingly large, complex, and heterogeneous
multi-omic data sets that are currently being generated, and ultimately support
the emerging field of precision network medicine.
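The abstract does not give the estimator. Our understanding is that the published method uses a leave-one-out linear interpolation, e_q = N * (e_all - e_minus_q) + e_minus_q, where e_all is the aggregate network from all N samples and e_minus_q is the aggregate network with sample q removed. This minimal sketch assumes that formula and uses Pearson correlation as the aggregate network method, which is an illustrative choice. The function names are hypothetical.

```python
import numpy as np

def pearson_network(expr):
    """Aggregate network: gene-gene Pearson correlation; expr is genes x samples."""
    return np.corrcoef(expr)

def sample_specific_networks(expr, network_fn=pearson_network):
    """Leave-one-out interpolation: e_q = N * (e_all - e_minus_q) + e_minus_q."""
    n_samples = expr.shape[1]
    e_all = network_fn(expr)
    nets = []
    for q in range(n_samples):
        e_minus_q = network_fn(np.delete(expr, q, axis=1))  # drop sample q
        nets.append(n_samples * (e_all - e_minus_q) + e_minus_q)
    return np.stack(nets)  # shape (n_samples, n_genes, n_genes)

rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 20))  # synthetic: 5 genes x 20 samples
nets = sample_specific_networks(expr)
print(nets.shape)  # (20, 5, 5)
```

The interpolation is exact when the aggregate network is an average of per-sample contributions. For example, with network_fn = lambda E: E @ E.T / E.shape[1], each recovered network equals the outer product of that sample's expression vector. For correlation it is an approximation.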
Passing Messages between Biological Networks to Refine Predicted Interactions
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and researchers increasingly attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model that uses multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but also captured information regarding specific biological mechanisms and pathways that were missed by other methodologies. PANDA is scalable to higher eukaryotes, applicable to tissue- or cell-type-specific data, and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net
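The message-passing idea can be illustrated with a simplified sketch under our own assumptions. This is not the released PANDA code. It assumes a Tanimoto-style similarity between rows of one matrix and columns of another, plus two messages that PANDA descriptions call "responsibility" (comparing the protein-protein interaction network with the regulatory network) and "availability" (comparing the regulatory network with co-expression). It also assumes an update weight alpha. It likely omits details of the released implementation, such as any input normalization or special handling of diagonal entries. The function name message_passing and all parameter values are hypothetical.

```python
import numpy as np

def similarity(X, Y):
    """Tanimoto-style similarity between rows of X and columns of Y (assumed form)."""
    A = X @ Y
    row_sq = np.sum(X ** 2, axis=1, keepdims=True)
    col_sq = np.sum(Y ** 2, axis=0, keepdims=True)
    return A / np.sqrt(row_sq + col_sq - np.abs(A))

def message_passing(W, P, C, alpha=0.1, tol=1e-3, max_iter=200):
    """W: TF x gene motif prior, P: TF x TF PPI, C: gene x gene co-expression."""
    W, P, C = (np.array(M, dtype=float) for M in (W, P, C))  # work on copies
    for step in range(1, max_iter + 1):
        R = similarity(P, W)  # responsibility: do TF i's partners target gene j?
        A = similarity(W, C)  # availability: does TF i target genes co-expressed with j?
        target = 0.5 * (R + A)
        change = np.mean(np.abs(target - W))
        W = (1 - alpha) * W + alpha * target  # move W part way toward the consensus
        if change < tol:
            break
        P = (1 - alpha) * P + alpha * similarity(W, W.T)  # TFs with similar targets
        C = (1 - alpha) * C + alpha * similarity(W.T, W)  # genes with similar regulators
    return W, step

rng = np.random.default_rng(2)
W0 = rng.random((4, 10))                       # synthetic motif scores, 4 TFs x 10 genes
P0 = np.eye(4)                                 # synthetic PPI: each TF with itself
C0 = np.corrcoef(rng.normal(size=(10, 15)))    # synthetic co-expression
W_final, n_steps = message_passing(W0, P0, C0)
```

The three networks are updated jointly, so evidence in one data type can reinforce or weaken edges inferred from another. The small alpha keeps each step conservative.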
Recommended from our members
Combinatorial Recruitment of CREB, C/EBPβ and c-Jun Determines Activation of Promoters upon Keratinocyte Differentiation
Background: Transcription factors CREB, C/EBPβ and Jun regulate genes involved in keratinocyte proliferation and differentiation. We asked whether specific combinations of CREB, C/EBPβ and c-Jun bound to promoters correlate with RNA polymerase II binding, mRNA transcript levels and methylation of promoters in proliferating and differentiating keratinocytes. Results: Induction of mRNA and RNA polymerase II by differentiation is highest when promoters are bound by C/EBPβ alone, C/EBPβ together with c-Jun, or by CREB, C/EBPβ and c-Jun, although in this case CREB binds with low affinity. In contrast, RNA polymerase II binding and mRNA levels change the least upon differentiation when promoters are bound by CREB either alone or in combination with C/EBPβ or c-Jun. Notably, promoters bound by CREB have relatively high levels of RNA polymerase II binding irrespective of differentiation. Inhibition of C/EBPβ or c-Jun preferentially represses mRNA when gene promoters are bound by the corresponding transcription factors and not CREB. Methylated promoters have relatively low CREB binding and, accordingly, those that are bound by C/EBPβ are induced by differentiation irrespective of CREB. Composite "Half and Half" consensus motifs and co-localizing consensus DNA-binding motifs are overrepresented in promoters bound by the combination of corresponding transcription factors. Conclusion: Correlational and functional data describe combinatorial mechanisms regulating the activation of promoters. Colocalization of C/EBPβ and c-Jun on promoters without strong CREB binding determines a high probability of activation upon keratinocyte differentiation.